In this paper we introduce a simple continuous-time asset pricing framework, based on general
multi-dimensional diffusion processes, that combines semi-analytic pricing with a nonlinear specification for
the market price of risk. Our framework guarantees the existence of weak solutions of the nonlinear SDEs
under the physical measure, thus allowing us to work with nonlinear models for the real world dynamics
not considered in the literature so far. It emerges that the additional flexibility in the time series
modelling is econometrically relevant: a nonlinear stochastic volatility diffusion model for the joint
time series of the S&P 100 and the VXO implied volatility index data shows superior forecasting power
over the standard specifications for implied and realized variance forecasting.
1 Introduction

Most financial time series exhibit rapid fluctuations while being extremely persistent at the same time. 
Violent fluctuations are often identified as jumps caused by events such as central bank meetings or rating 
announcements. Economic intuition suggests that, for example, interest rates should be stationary.
However, unit-root tests often imply that interest rates are integrated and therefore exhibit extreme
persistence. Ideally a model should be able to accommodate both extremes while maintaining compatibility
with economic theory: random-walk-like behaviour in a certain region, and reversion towards a mean
outside it. At first glance establishing the existence of such a model under the real world measure appears 
to be very difficult. A diffusion process with these characteristics would clearly need to exhibit a highly 
nonlinear drift under the physical measure, which implies that global Lipschitz and growth conditions, 
typically required for the existence of a solution to a multi-dimensional SDE, are not satisfied. In a univariate
diffusion setting Aït-Sahalia (1996) applies a more general method, only available in dimension one, to
ascertain the existence of a model that exhibits the desired characteristics. The reason for the econometric 
success of his model lies in the nonlinearity of the drift. Two main obstacles to a wide applicability of such 
models remain. The first is the lack of closed-form, or at least semi-analytic, solutions for the prices of 
contingent claims within the nonlinear framework. The second is a lack of tools for proving the existence 
of solutions to the stochastic differential equations used when attempting to introduce nonlinearity in a 
multivariate setting. 

Employing econometrically inconspicuous dampening functions we introduce a simple multivariate 
diffusion framework which exploits the existence of a solution of an SDE under a risk-neutral probabil- 
ity measure and guarantees the existence of a weak solution of a nonlinear SDE under the real world 
probability measure. From the econometric point of view our framework extends the affine approach 
from Cheridito et al. (2007) yielding substantially enriched dynamics. The most obvious application is 
a state variable formulation that entails (semi-)analytic pricing under the risk-neutral measure, which 
leaves flexibility for the dynamics under the physical measure similar to that of the discrete-time ap- 
proach considered in Dai et al. (2006) and Bertholon et al. (2008). Recent advances in estimating the 
parameters of nonlinear diffusions such as the algorithms introduced in Aït-Sahalia (2001), Beskos et al.
(2006) and Mijatovic and Schneider (2009) ensure that reliable parameter inference can be made without 
explicit formulae for transition densities. An empirical application based on the joint time series of the 
S&P 100 and the VXO implied volatility indices reveals that our framework offers statistically significant 
advantages out of sample over extant model specifications in predicting implied as well as realized variance 
over several forecasting horizons. Furthermore we find that the size and sign of the variance risk premia 
implied by our model coincide with the model-independent results in Carr and Wu (2008).

The paper is organized as follows. In Section 2 we describe the main theoretical construction (see 
Theorem 1) for the framework we consider. Section 3 describes the nonlinear stochastic volatility model, 
its likelihood function and the estimation algorithm which is used to find the parameter values implied 
by the time series of the S&P 100 and the VXO index data. The empirical results are discussed in 
Subsection 3.5 and can be found in the Appendix. Section 4 concludes the paper. 

2 The modelling framework 

In this section we describe the theoretical basis for the modelling framework used in the present paper. As 
mentioned in the introduction, Theorem 1 allows us to define our model under the pricing measure Q and 
perform Girsanov's measure change to obtain any desired model under the physical measure P in a wide 
class of Itô processes where the state vector satisfies a possibly nonlinear SDE. Theorem 1 also provides a
weak solution of this SDE. The central building block of our approach is provided by the following very 
simple observation. 

Theorem 1. Fix a time horizon $T > 0$ and suppose $X = (X_t)_{t\in[0,T]}$ is an Itô process with state space
$\mathcal{D} \subset \mathbb{R}^n$ that satisfies the following SDE under the pricing measure $\mathbb{Q}$

$$dX_t = \mu^{\mathbb{Q}}(X_t)\,dt + \Sigma(X_t)\,dW_t^{\mathbb{Q}}, \qquad X_0 = x_0 \in \mathcal{D}, \tag{1}$$

where the drift is given by the function $\mu^{\mathbb{Q}} : \mathcal{D} \to \mathbb{R}^n$ and $W^{\mathbb{Q}} = (W_t^{\mathbb{Q}})_{t\in[0,T]}$ is a standard $n$-dimensional
Brownian motion under $\mathbb{Q}$. We further assume that the volatility function $\Sigma : \mathcal{D} \to \mathbb{R}^{n\times n}$ satisfies
$|\det\Sigma(x)| > 0$ for all $x \in \mathcal{D}$. Let $f : \mathcal{D} \to \mathbb{R}^n$ be any measurable function with coordinates $f_j : \mathcal{D} \to \mathbb{R}$,
$j = 1, \ldots, n$, and define the function $D : \mathcal{D} \to \mathbb{R}_+$ by the formula



D(x) := exp 



c 



detS(x /l 



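where $c$ is some positive constant. Then the function $\Lambda : \mathcal{D} \to \mathbb{R}^n$, defined by the formula

$$\Lambda(x) := D(x)\,\Sigma^{-1}(x)\,f(x),$$

is bounded and the process $\eta = (\eta_t)_{t\in[0,T]}$ given by

$$\eta_t = \exp\left( \int_0^t \Lambda(X_s)^{\top}\,dW_s^{\mathbb{Q}} - \frac{1}{2}\int_0^t \Lambda(X_s)^{\top}\Lambda(X_s)\,ds \right), \qquad t \le T,$$

is a $\mathbb{Q}$-martingale. The dynamics of $X = (X_t)_{t\in[0,T]}$ under the real world measure $\mathbb{P}$, which is defined
via the Radon-Nikodym derivative $\frac{d\mathbb{P}}{d\mathbb{Q}} = \eta_T$, are then given by

$$dX_t = \left( D(X_t) f(X_t) + \mu^{\mathbb{Q}}(X_t) \right) dt + \Sigma(X_t)\,dW_t^{\mathbb{P}}, \qquad X_0 = x_0, \tag{2}$$

where $W^{\mathbb{P}} = (W_t^{\mathbb{P}})_{t\in[0,T]}$ is a standard $n$-dimensional Brownian motion under the measure $\mathbb{P}$, defined by
$W_t^{\mathbb{P}} := W_t^{\mathbb{Q}} - \int_0^t \Lambda(X_s)\,ds$.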

The proof of Theorem 1 follows by construction since the random variable $\Lambda(X_t)$ is bounded uniformly
in $t \in [0, T]$. Therefore the Novikov criterion (see Proposition 1.15 in Chapter VIII of Revuz and Yor
(1999)) applies and the density process $\eta$ is a true martingale under the pricing measure Q. The other
statements in Theorem 1 follow from Girsanov's theorem (see Theorems 1.4 and 1.7 in Chapter VIII
of Revuz and Yor (1999)).

The sole purpose of the dampening function $D$ in Theorem 1 is to ensure the existence of the real world
probability measure P, which is equivalent to the pricing measure Q. Note that the positive constant $c$ in
the function $D$ can be made arbitrarily small. If the volatility function $\Sigma$ and the drift function
$f$ are continuous in the state variable $x \in \mathcal{D}$, the dampening factor $D$ equals one in a finite precision
environment (i.e. a computer) on an arbitrarily large compact subset of the domain $\mathcal{D}$. As a consequence
we have a large amount of freedom when specifying the drift function $f + \mu^{\mathbb{Q}}$ that can achieve the
desired drift behaviour of the model under the real world measure P. The key observation here is that the
constant $c$ in the function $D$ does not need to be estimated. It is enough to know that it exists. By
Theorem 1 this implies that the solution of SDE (2) also exists and that the corresponding process behaves in
the desired way under the real world measure.
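
The finite-precision remark can be checked numerically. The following is only a minimal sketch: it assumes a dampening factor of the generic form exp(-c * penalty), where `penalty` is a hypothetical non-negative stand-in for the state-dependent term inside the exponential of D, and where the constant `C` and the penalty values are illustrative choices, not quantities from the paper.

```python
import math

def dampening(penalty: float, c: float) -> float:
    """Hypothetical dampening factor exp(-c * penalty), with penalty >= 0."""
    return math.exp(-c * penalty)

C = 1e-40  # assumed, deliberately tiny positive constant

# Illustrative penalty values covering a very large range of states.
penalties = [0.0, 1.0, 1e6, 1e12, 1e20]
factors = [dampening(p, C) for p in penalties]

# In double precision every one of these factors is exactly 1.0.
equals_one = all(factor == 1.0 for factor in factors)

# Far outside that range the factor still decays, which is what keeps
# the market price of risk bounded in the theorem.
far_out = dampening(1e45, C)
```

In this sketch the dampening is invisible to any numerical procedure operating on the listed range of states, while it remains active for extreme states.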

It remains to specify a flexible model under the pricing measure Q. We should stress here that 
the only assumption in Theorem 1 on the process X under the pricing measure Q is that it exists and 
satisfies SDE (1) in the theorem. Therefore the specification of the measure Q is in practice informed 
by the analytical tractability of the model in terms of the pricing of derivatives. A common choice in
the multivariate diffusion setting is the class of affine processes. The existence of this class of models is established
in Duffie et al. (2003) and the algorithms for the pricing of contingent claims, which rest on the extended
transform methods, are developed in Duffie et al. (2000). In the application discussed in this paper (see
Section 3) we shall deviate slightly from the affine class and consider a stochastic volatility model based 
on a GARCH diffusion, which is in the class of polynomial models. The existence of the process is not 
difficult to prove and will be established in Section 3. In this model it is possible to compute analytically 
an approximation for the implied volatility in terms of the model parameters under Q. This feature is 
crucial because the goal is to estimate the risk-neutral and the real world parameters simultaneously. We 
conclude this section with a simple affine example that illustrates the application of Theorem 1. 

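Example: Consider the following univariate short rate model

$$dr_t = \left( a^{\mathbb{Q}} - b^{\mathbb{Q}} r_t \right) dt + \sigma\sqrt{r_t}\,dW_t^{\mathbb{Q}}$$

with state space $\mathcal{D} = (0, \infty)$ and $2a^{\mathbb{Q}} > \sigma^2$. Let

$$f(r) := a^{\mathbb{P}} - a^{\mathbb{Q}} - (b^{\mathbb{P}} - b^{\mathbb{Q}})\,r$$

and let $D$ be the corresponding dampening function from Theorem 1.
Under the real world probability measure P the process satisfies the SDE

$$dr_t = \left[ a^{\mathbb{Q}} - b^{\mathbb{Q}} r_t + D(r_t)\left( a^{\mathbb{P}} - a^{\mathbb{Q}} - (b^{\mathbb{P}} - b^{\mathbb{Q}})\,r_t \right) \right] dt + \sigma\sqrt{r_t}\,dW_t^{\mathbb{P}}.$$

Since the domain $\mathcal{D}$ of the process $(r_t)_{t\in[0,T]}$ is the positive real line, we can choose the constant $c$ small
enough so that for numerical purposes such as estimation we can assume that the dynamics of the process
are given by

$$dr_t = \left( a^{\mathbb{P}} - b^{\mathbb{P}} r_t \right) dt + \sigma\sqrt{r_t}\,dW_t^{\mathbb{P}}.$$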

3 Application 

In this section we are going to apply Theorem 1 in order to estimate an affine (linear) and a nonlinear 
model on the joint time series of the S&P 100 and VXO implied volatility index. We first describe the 
data set and the two models that will be used for prediction and then discuss in some detail the expected 
maximum likelihood (EML) estimation algorithm, which is used for parameter inference of the nonlinear 
diffusions. Finally we perform a statistical test given in Clark and West (2007) on the estimated models 
with respect to their forecasting power. 

3.1 Data 

Models are estimated using daily S&P 100 log prices and daily VXO implied volatilities. The VXO index 
is defined in terms of the current value of the expected realised variance of S&P 100 over a period of one 
month. The CBOE computes the value of VXO using a carefully designed portfolio of exchange traded call
and put options on the S&P 100 that expire in one month's time. The algorithm used by CBOE enables 
them to obtain a time series of model independent implied volatility. Figure 1 shows the trajectory of the 
VXO implied volatility index published by the CBOE and the logarithm of the S&P 100 index. 

We partition our data set into two non-overlapping subsets. The in-sample period ranges from 2 
January 1990 until 31 December 1999 and the out-of-sample period lasts from 3 January 2000 until 29
December 2006. 

3.2 S&P 100 stochastic volatility model 

We choose a stochastic GARCH diffusion variance model for the joint time series of the logarithm of the
S&P 100 prices and instantaneous variance, which evolves under the pricing measure Q according to the
SDE

$$dX_t = \left( r - \tfrac{1}{2} V_t \right) dt + \rho\sqrt{V_t}\,dW_t^{V\mathbb{Q}} + \sqrt{1-\rho^2}\,\sqrt{V_t}\,dW_t^{X\mathbb{Q}}, \tag{3}$$
$$dV_t = \left( b_0^{\mathbb{Q}} + b_1^{\mathbb{Q}} V_t \right) dt + \sigma V_t\,dW_t^{V\mathbb{Q}}, \tag{4}$$

where $W^{V\mathbb{Q}} = (W_t^{V\mathbb{Q}})_{t\in[0,T]}$ and $W^{X\mathbb{Q}} = (W_t^{X\mathbb{Q}})_{t\in[0,T]}$ are two independent standard Brownian motions.
Note that the risk-neutral drift $\mu^{\mathbb{Q}}$ from Theorem 1 can be expressed as

$$\mu^{\mathbb{Q}}(x, v) = \begin{pmatrix} r - \tfrac{1}{2} v \\ b_0^{\mathbb{Q}} + b_1^{\mathbb{Q}} v \end{pmatrix}.$$

Figure 1: Log of S&P 100 index and VXO: The figure shows the evolution of the logarithm of the S&P
100 index (left y-axis) and of the implied volatility index VXO (right y-axis). The sample is comprised of an
in-sample period (shaded in white) and an out-of-sample period (shaded in grey).

It is well known that the SDE in (4) has a solution for all values of $b_0^{\mathbb{Q}}$, $b_1^{\mathbb{Q}}$ and $\sigma$. Assume that

$$b_0^{\mathbb{Q}} > 0, \tag{5}$$

and note that the comparison theorem for the solutions of SDEs (see Proposition 5.2.18
in Karatzas and Shreve (1991)), applied to $V = (V_t)_{t\in[0,T]}$ and the geometric Brownian motion that
solves (4) when $b_0^{\mathbb{Q}} = 0$, implies that the process $V$ does not leave the interval $(0, \infty)$ in finite time. It
therefore follows that under condition (5) we can define the process $X = (X_t)_{t\in[0,T]}$ as a stochastic integral
given by (3). This argument shows that the process $(X, V)^{\top}$ with state space $\mathcal{D} = \mathbb{R} \times (0, \infty)$ exists under
the pricing measure Q and that it follows SDE (3)-(4). It is shown in Forman and Sørensen (2008) that
if in addition we have $b_1^{\mathbb{Q}} < 0$, then the variance process $V$ is ergodic.
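
To make the variance dynamics in (4) concrete, the following sketch simulates the GARCH diffusion with an Euler scheme applied to log V, which keeps the simulated variance strictly positive, and compares the simulated mean with the conditional mean implied by the affine drift. The function names and all parameter values are illustrative assumptions, not estimates or code from the paper.

```python
import numpy as np

def simulate_garch_variance(v0, b0, b1, sigma, horizon, n_steps, n_paths, seed=0):
    """Euler scheme for log V, where dV = (b0 + b1 V) dt + sigma V dW."""
    rng = np.random.default_rng(seed)
    dt = horizon / n_steps
    log_v = np.full(n_paths, np.log(v0))
    for _ in range(n_steps):
        v = np.exp(log_v)
        # Ito's formula: d log V = ((b0 + b1 V) / V - sigma^2 / 2) dt + sigma dW
        drift = (b0 + b1 * v) / v - 0.5 * sigma**2
        log_v += drift * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_paths)
    return np.exp(log_v)

def expected_variance(v0, b0, b1, horizon):
    """Conditional mean of V at the horizon for the affine drift b0 + b1 V."""
    growth = np.exp(b1 * horizon)
    return v0 * growth + b0 / b1 * (growth - 1.0)

# Illustrative parameters (b0 > 0 as in condition (5), b1 < 0 for ergodicity).
terminal = simulate_garch_variance(v0=0.02, b0=0.08, b1=-2.0, sigma=0.8,
                                   horizon=0.25, n_steps=250, n_paths=20_000)
print(terminal.mean(), expected_variance(0.02, 0.08, -2.0, 0.25))
```

Working in log V is a design choice of this sketch: a direct Euler step on V could produce negative values, whereas the exponential of the simulated log-variance cannot.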

The power in the volatility function of SDE (4) (i.e. the CEV power) for the GARCH diffusion V is 
equal to one. This is a pragmatic and parsimonious educated guess between the CEV powers of 0.65 in 
Aït-Sahalia and Kimmel (2007) and 1.33/1.17 in Jones (2003) for similar data sets. More sophisticated
volatility-of-volatility functions are also possible, e.g. (/?o+A Vt+fii V t )^ 2 , but the existence of solutions 
of such SDEs is more difficult to establish. Here we use the simple GARCH diffusion process (see Nelson 
(2002) for this terminology) because our focus in this paper is on the nonlinear drift specification. 

Under the physical measure P we shall consider a linear and a nonlinear drift specification. The
former P-drift is linear in the state variables (LN model) and is given by the formula

$$\mu^{\mathbb{P}}_{LN}(x, v) = \begin{pmatrix} a_0 + a_1 v \\ b_0^{\mathbb{Q}} + b_1 v \end{pmatrix}. \tag{6}$$

This drift corresponds to a usual market price of risk assumption as stated in Jones (2003) and
in Aït-Sahalia and Kimmel (2007). In the language of Section 2 the drift $\mu^{\mathbb{P}}_{LN}$ can be expressed using
the function $f$ given by

$$f_{LN}(x, v) = \begin{pmatrix} a_0 - r + (a_1 + \tfrac{1}{2})\,v \\ (b_1 - b_1^{\mathbb{Q}})\,v \end{pmatrix}. \tag{7}$$

The linear model LN will serve as a benchmark for econometric relevance of the nonlinear model (NL
model) whose drift under the physical measure P is given by

$$\mu^{\mathbb{P}}_{NL}(x, v) = \begin{pmatrix} a_0 + a_1 v \\ b_0 + b_1 v + b_2 v^2 + b_3 / v \end{pmatrix}. \tag{8}$$

The corresponding function $f$ from Section 2 is given as

$$f_{NL}(x, v) = \begin{pmatrix} a_0 - r + (a_1 + \tfrac{1}{2})\,v \\ b_0 - b_0^{\mathbb{Q}} + (b_1 - b_1^{\mathbb{Q}})\,v + b_2 v^2 + b_3 / v \end{pmatrix}. \tag{9}$$

As described in Section 2 the dampening functions $D_{NL}$, $D_{LN}$ can be made arbitrarily close to one through
the choice of the constant $c$. Hence for numerical purposes and econometric implementation it suffices to
work directly with the drifts $\mu^{\mathbb{P}}_{LN}$ and $\mu^{\mathbb{P}}_{NL}$ given in (6) and (8) respectively.

For implementation it is convenient to consider the process $Y = (Y_t)_{t\in[0,T]}$, given by $Y_t := \gamma(V_t)$, where
the transformation $\gamma : (0, \infty) \to \mathbb{R}$ of the variance process is defined by the formula

$$\gamma(v) := \frac{\log v}{\sigma}. \tag{10}$$

The evolution of the process $Y$ under the physical measure P is given by

$$dY_t = \left( \left( b_0^{\mathbb{Q}} + b_1 V_t \right) \frac{1}{\sigma V_t} - \frac{\sigma}{2} \right) dt + dW_t^{V\mathbb{P}} \tag{11}$$

in linear model (6) and by

$$dY_t = \left( \left( b_0 + b_1 V_t + b_2 V_t^2 + \frac{b_3}{V_t} \right) \frac{1}{\sigma V_t} - \frac{\sigma}{2} \right) dt + dW_t^{V\mathbb{P}} \tag{12}$$

in nonlinear model (8).

For estimation purposes we partition the parameter vector $\theta$ into four classes. The first class $\theta^{\sigma}$
contains the parameters that influence the dynamics under both the physical measure P and the pricing
measure Q. The second class $\theta^{\mathbb{Q}}$ contains the parameters that arise only under the pricing measure Q.
The third set $\theta^{X\mathbb{P}}$ contains the parameters that influence the dynamics of the process $X$ only under the
physical measure P and the fourth class $\theta^{V\mathbb{P}}$ contains the parameters that arise only under the measure P
in the SDE for the variance process $V$. It is clear that we can express $\theta = \theta^{\sigma} \cup \theta^{\mathbb{Q}} \cup \theta^{X\mathbb{P}} \cup \theta^{V\mathbb{P}}$ and that
these four classes are pairwise disjoint.

                       θ^σ            θ^Q            θ^XP        θ^VP
Linear Spec. (6)       σ, ρ, b_0^Q    b_1^Q          a_0, a_1    b_1
Nonlinear Spec. (8)    σ, ρ           b_0^Q, b_1^Q   a_0, a_1    b_0, b_1, b_2, b_3

Table 1: Parameter sets for the linear and the nonlinear model: the table displays the partition of
the parameter vector $\theta$ into the following groups: $\theta^{\sigma}$ (the parameters that influence the dynamics under both
the risk-neutral measure Q as well as the physical measure P); $\theta^{\mathbb{Q}}$ (parameters that influence only the
risk-neutral dynamics); $\theta^{X\mathbb{P}} \cup \theta^{V\mathbb{P}}$ (parameters that appear only under the physical measure P).
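
The drifts of the transformed process in (11) and (12) follow from Itô's formula applied to the logarithmic transformation (10). The short derivation below is a sketch; the symbol $\mu_V$ for the P-drift of the variance is introduced here only for this purpose and stands for the second component of the drift in (6) or (8).

```latex
\begin{align*}
dV_t &= \mu_V(V_t)\,dt + \sigma V_t\,dW_t^{V\mathbb{P}},
\qquad \gamma'(v) = \frac{1}{\sigma v},
\qquad \gamma''(v) = -\frac{1}{\sigma v^{2}}, \\
dY_t &= \gamma'(V_t)\,dV_t + \tfrac{1}{2}\,\gamma''(V_t)\,\sigma^{2} V_t^{2}\,dt \\
     &= \left( \frac{\mu_V(V_t)}{\sigma V_t} - \frac{\sigma}{2} \right) dt + dW_t^{V\mathbb{P}} .
\end{align*}
```

The transformed process therefore has unit diffusion coefficient, which is what makes it convenient for the estimation procedure: all parameter dependence other than through $\sigma$ sits in the drift.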

3.3 Likelihood function 

The instantaneous stochastic variance is a latent variable even though a time series of the implied variance
is available through the VXO index. Note that the drift of the variance $V$ in our model, given in SDE (4)
under the pricing measure Q, is affine. Therefore the current price of the variance swap is linear in
the current value of the variance $V_t$ in our model as the following simple calculation, based on Fubini's
theorem, demonstrates

$$\frac{1}{\Delta}\,\mathbb{E}^{\mathbb{Q}}_t\left[ \int_t^{t+\Delta} V_s\,ds \right] = A(\theta^{\mathbb{Q}}, \Delta) + B(\theta^{\mathbb{Q}}, \Delta)\,V_t, \qquad \Delta > 0, \tag{13}$$

where the coefficients $A(\theta^{\mathbb{Q}}, \Delta)$ and $B(\theta^{\mathbb{Q}}, \Delta)$ are given by

$$B(\theta^{\mathbb{Q}}, \Delta) = \frac{1}{b_1^{\mathbb{Q}}\Delta}\left( \exp(b_1^{\mathbb{Q}}\Delta) - 1 \right), \qquad A(\theta^{\mathbb{Q}}, \Delta) = -\frac{b_0^{\mathbb{Q}}}{b_1^{\mathbb{Q}}}\left( 1 - B(\theta^{\mathbb{Q}}, \Delta) \right). \tag{14}$$

We define $IV_t$ as the squared VXO index (described in Section 3.1) observed at time $t$. It is directly
related to the expected variance over the period of 22 days (i.e. $\Delta = 22/262$) by the formula

$$IV_t \approx \frac{1}{\Delta}\,\mathbb{E}^{\mathbb{Q}}_t\left[ \int_t^{t+\Delta} V_s\,ds \right]. \tag{15}$$

This approximation is very good and the error stems solely from the fact that the algorithm that computes
the value of VXO uses finitely many options.
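
The linear relation between implied and instantaneous variance can be illustrated with a few lines of code. The sketch below evaluates the coefficients A and B of (14), inverts the relation to recover the latent variance from an implied variance, and checks (13) against a direct numerical integration of the conditional mean of the affine variance process. The function names and the parameter values in the usage lines are illustrative assumptions.

```python
import math

def swap_coefficients(b0_q, b1_q, delta):
    """Coefficients A and B for the affine Q-drift b0_q + b1_q * V."""
    b = math.expm1(b1_q * delta) / (b1_q * delta)
    a = -(b0_q / b1_q) * (1.0 - b)
    return a, b

def implied_to_instantaneous(iv, b0_q, b1_q, delta):
    """Invert IV = A + B * V to recover the latent variance V."""
    a, b = swap_coefficients(b0_q, b1_q, delta)
    return (iv - a) / b

def expected_average_variance(v, b0_q, b1_q, delta, n_grid=20_000):
    """Midpoint-rule integral of E[V_s | V_t = v] over [t, t + delta], divided by delta."""
    step = delta / n_grid
    total = 0.0
    for k in range(n_grid):
        growth = math.exp(b1_q * (k + 0.5) * step)
        total += v * growth + b0_q / b1_q * (growth - 1.0)
    return total * step / delta

# Illustrative values: a positive (explosive) b1_q and a 22-day horizon.
a, b = swap_coefficients(0.05, 3.0, 22 / 262)
print(a + b * 0.03, expected_average_variance(0.03, 0.05, 3.0, 22 / 262))
```

With a positive `b1_q` the coefficient B exceeds one and A is positive, so the recovered instantaneous variance lies below the implied variance.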

We now exploit the relationship in (15) to express the log-likelihood function for both the linear and
the nonlinear model described by the real world drifts given in (6) and (8) respectively and by SDE (3)-(4)
under the pricing measure Q. By the Markov property we can in both models decompose the
log-likelihood into a sum of log-transition densities (for ease of notation we henceforth denote $IV_{t_i}$ by $IV_i$
and $X_{t_i}$ by $X_i$) as follows

$$\ell(X_1, IV_1, \ldots, X_N, IV_N \mid X_0, IV_0, \theta) = \sum_{i=1}^{N} \log p^{IV}(X_i, IV_i \mid X_{i-1}, IV_{i-1}, \theta), \tag{16}$$

where $p^{IV}(X_i, IV_i \mid X_{i-1}, IV_{i-1}, \theta)$ denotes the conditional transition density of the random vector
$(X_{t_i}, IV_{t_i})$. The linear transformation

$$V_t = \frac{IV_t - A(\theta^{\mathbb{Q}}, \tau)}{B(\theta^{\mathbb{Q}}, \tau)}, \tag{17}$$

which follows from (15), implies that we can express the log-likelihood as

$$\sum_{i=1}^{N} \log p^{V}(X_i, V_i \mid X_{i-1}, V_{i-1}, \theta) - N \log B(\theta^{\mathbb{Q}}, \tau), \tag{18}$$

where $p^{V}(X_i, V_i \mid X_{i-1}, V_{i-1}, \theta)$ denotes the conditional transition density of the random vector $(X_{t_i}, V_{t_i})$
given the values of $X_{i-1}$ and $V_{i-1}$. The final change of variable $Y_i = \gamma(V_i)$ given in (10) yields the
log-likelihood which takes the form

$$\ell(\theta) = \sum_{i=1}^{N} \left\{ \log p(X_i, Y_i \mid X_{i-1}, Y_{i-1}, \theta) - \sigma Y_i \right\} - N\left( \log B(\theta^{\mathbb{Q}}, \tau) + \log\sigma \right), \tag{19}$$

where $p(X_i, Y_i \mid X_{i-1}, Y_{i-1}, \theta)$ denotes the conditional transition density of the random vector $(X_{t_i}, Y_{t_i})$.

3.4 Limited information expected maximum likelihood estimation

The transition densities for the transformed variance processes (11) and (12) that arise in the
log-likelihood (19) are not available in closed form. To overcome this issue and that of the supposedly
flat likelihood function we apply the expected maximum likelihood (EML) estimation algorithm
from Mijatovic and Schneider (2009). This technique makes use of the closeness of the law of the Brownian
bridge to the true law of the diffusion bridge, and of the Euler scheme approximation for the transition
density when the time interval between observations is small. However, the EML algorithm cannot be
directly applied to the present econometric problem, as it only works for the estimation of one-dimensional
diffusions. We therefore propose an efficient three-step limited-information maximum likelihood procedure,
described below. The efficiency of the algorithm arises from the fact that EML can be used to express the
globally optimal drift parameters $\theta^{V\mathbb{P}*}$ and $\theta^{X\mathbb{P}*}$ (optimal parameters are denoted with a superscript *) as
complicated, yet closed-form functions of the parameters $\theta^{\sigma} \cup \theta^{\mathbb{Q}}$ and the data. In other words, for fixed
values $\theta^{\sigma} \cup \theta^{\mathbb{Q}}$ the data imply optimal parameter values $\theta^{V\mathbb{P}*}$ and $\theta^{X\mathbb{P}*}$. The EML algorithm therefore
effectively reduces the parameter space from $\theta^{X\mathbb{P}} \cup \theta^{V\mathbb{P}} \cup \theta^{\sigma} \cup \theta^{\mathbb{Q}}$ to $\theta^{\sigma} \cup \theta^{\mathbb{Q}}$. As a result a conventional
likelihood search using standard optimization techniques over equation (19) is necessary only for $\theta^{\sigma} \cup \theta^{\mathbb{Q}}$.

To make our processes suitable for EML estimation we first introduce $M - 1$ auxiliary data points
$U_{i,1}, \ldots, U_{i,M-1}$ between each observed data pair $(X_i, Y_i)^{\top}$, $(X_{i+1}, Y_{i+1})^{\top}$ with the convention that
$U_{i,0} := U_i := (X_i, Y_i)^{\top}$, and $U_{i,M} := U_{i+1} := (X_{i+1}, Y_{i+1})^{\top}$. This augmentation leads to a total of
$MN + 1$ data pairs. To lighten notation we switch for the equations below to a single-index notation
$U_k$, $k = 0, \ldots, MN$. We denote by $\delta$ the time increment between two consecutive points of the augmented
data set and write down the discretized version of the continuous-time SDE,
eliminating heteroskedasticity in the innovations, for the linear variance model (LN)

$$\begin{aligned}
\frac{X_{k+1} - X_k - \rho\sqrt{e^{\sigma Y_k}}\,\epsilon^{Y}_{k+1}}{\sqrt{1-\rho^2}\,\sqrt{e^{\sigma Y_k}}} &= \left( a_0 + a_1 e^{\sigma Y_k} \right) \frac{1}{\sqrt{1-\rho^2}\,\sqrt{e^{\sigma Y_k}}}\,\delta + \epsilon^{X}_{k+1}, \\
Y_{k+1} - Y_k + \frac{\sigma}{2}\,\delta &= \left( b_0^{\mathbb{Q}} + b_1 e^{\sigma Y_k} \right) \frac{1}{\sigma e^{\sigma Y_k}}\,\delta + \epsilon^{Y}_{k+1},
\end{aligned} \tag{20}$$
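and the nonlinear variance model (NL)

$$\begin{aligned}
\frac{X_{k+1} - X_k - \rho\sqrt{e^{\sigma Y_k}}\,\epsilon^{Y}_{k+1}}{\sqrt{1-\rho^2}\,\sqrt{e^{\sigma Y_k}}} &= \left( a_0 + a_1 e^{\sigma Y_k} \right) \frac{1}{\sqrt{1-\rho^2}\,\sqrt{e^{\sigma Y_k}}}\,\delta + \epsilon^{X}_{k+1}, \\
Y_{k+1} - Y_k + \frac{\sigma}{2}\,\delta &= \left( b_0 + b_1 e^{\sigma Y_k} + b_2 e^{2\sigma Y_k} + \frac{b_3}{e^{\sigma Y_k}} \right) \frac{1}{\sigma e^{\sigma Y_k}}\,\delta + \epsilon^{Y}_{k+1}.
\end{aligned} \tag{21}$$

It can be seen that the difference equations (20) and (21) above, for both log stock prices and
stochastic variance, can be written in the form

$$g_X(U_{k+1}, U_k) = \left( a_0 f^{X}_0(U_k) + a_1 f^{X}_1(U_k) \right) \delta + \epsilon^{X}_{k+1},$$
$$g_{\mathcal{M}}(U_{k+1}, U_k) = \sum_{l=0}^{L_{\mathcal{M}}} b_l\,f^{\mathcal{M}}_l(U_k)\,\delta + \epsilon^{Y}_{k+1}, \qquad \mathcal{M} \in \{LN, NL\},$$

where the functions $g$ and $f$ are displayed in Tables 2b and 2a. For the linear variance model we have
$L_{LN} = 1$, and for the nonlinear model we have $L_{NL} = 3$. The innovations $\epsilon^{Y}_k$ and $\epsilon^{X}_k$ are independent,
identically $N(0, \delta)$-distributed random variables for $k = 1, \ldots, MN$. Note the appearance of $\epsilon^{Y}_{k+1}$
in equations (20) and (21), respectively. This is the reason why we need to estimate $\theta^{V\mathbb{P}}$ first (the variance
dynamics do not depend on the log stock price) and $\theta^{X\mathbb{P}}$ subsequently, conditional on $\theta^{V\mathbb{P}*}$. Hence the
terminology 'limited information' EML estimation. For the algorithm below denote by $\theta^{V\mathbb{P}}_{\mathcal{M}}$, $\mathcal{M} \in
\{LN, NL\}$, the parameters of the linear and the nonlinear variance process, respectively. If there is no ambiguity
we will just write $\theta^{V\mathbb{P}}$.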

Optimizing $\theta^{V\mathbb{P}} \mid \theta^{\sigma}, \theta^{\mathbb{Q}}$: Plugging the current values of $\theta^{\sigma}$ and $\theta^{\mathbb{Q}}$ into eq. (15), the observed data
imply a time series of $Y$. An estimate of the parameters of the transformed variance process $Y$ can now
be obtained by means of EML (Mijatovic and Schneider, 2009). For this purpose we introduce functions
$f$ and $g$ from difference equations (20) and (21) which are displayed in Table 2. For a given variance
model $\mathcal{M}$ we put the variance drift parameters in a vector $x^{\mathcal{M}} := (b^{\mathcal{M}}_0, \ldots, b^{\mathcal{M}}_{L_{\mathcal{M}}})^{\top}$. EML then yields the





M = LN M = NL 



frW) l/(ae^) l/(ae^) 
^K) l/(ae 2 ^) 



9M(uk,Uk-i) Vk ~ Vk-i + ~ b o) 5 Vh ~ Vk-i + 2 a5 
(a) Variance drift functions 



i/( v / i -p 2 V^) 

V / e^/ v / l - P 2 



(b) Stock drift functions for model X G {NL, LN} 



Table 2: Function specification for EML estimation: The tables contain the functions that appear as
summands in the respective drifts of the LN and NL models, which need to be evaluated in the conditional
expectations in (22) and (23), expressed in terms of the variable $u_k := (x_k, y_k)$.



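optimal drift coefficients $\theta^{V\mathbb{P}*}_{\mathcal{M}}$ as the unique solution of the linear system $x^{\mathcal{M}} = (S^{\mathcal{M}})^{-1} w^{\mathcal{M}}$ with

$$S^{\mathcal{M}} = \sum_{n=1}^{N-1} \sum_{m=0}^{M-1}
\begin{pmatrix}
\mathbb{E}_{\mathbb{D}^{\mathcal{M}}_{U_n,U_{n+1}}}\!\left[ f^{\mathcal{M}}_0(U_{n,m})\,f^{\mathcal{M}}_0(U_{n,m}) \right] & \cdots & \mathbb{E}_{\mathbb{D}^{\mathcal{M}}_{U_n,U_{n+1}}}\!\left[ f^{\mathcal{M}}_0(U_{n,m})\,f^{\mathcal{M}}_{L_{\mathcal{M}}}(U_{n,m}) \right] \\
\vdots & \ddots & \vdots \\
\mathbb{E}_{\mathbb{D}^{\mathcal{M}}_{U_n,U_{n+1}}}\!\left[ f^{\mathcal{M}}_{L_{\mathcal{M}}}(U_{n,m})\,f^{\mathcal{M}}_0(U_{n,m}) \right] & \cdots & \mathbb{E}_{\mathbb{D}^{\mathcal{M}}_{U_n,U_{n+1}}}\!\left[ f^{\mathcal{M}}_{L_{\mathcal{M}}}(U_{n,m})^2 \right]
\end{pmatrix}, \tag{22}$$

$$w^{\mathcal{M}} = \sum_{n=1}^{N-1} \sum_{m=0}^{M-1}
\begin{pmatrix}
\mathbb{E}_{\mathbb{D}^{\mathcal{M}}_{U_n,U_{n+1}}}\!\left[ g(U_{n,m+1}, U_{n,m})\,f^{\mathcal{M}}_0(U_{n,m}) \right] \\
\vdots \\
\mathbb{E}_{\mathbb{D}^{\mathcal{M}}_{U_n,U_{n+1}}}\!\left[ g(U_{n,m+1}, U_{n,m})\,f^{\mathcal{M}}_{L_{\mathcal{M}}}(U_{n,m}) \right]
\end{pmatrix}. \tag{23}$$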



The symbol $\mathbb{D}^{\mathcal{M}}_{x,y}$ denotes the (unknown) law of the diffusion bridge pertaining to model $\mathcal{M}$ conditioned
on the endpoints $x$ and $y$, respectively. We approximate the law $\mathbb{D}^{\mathcal{M}}_{x,y}$ of the true diffusion bridge with
the law $\mathbb{W}_{x,y}$ of a Brownian bridge. It is shown in Mijatovic and Schneider (2009) that $\mathbb{D}^{\mathcal{M}}_{x,y}$ is absolutely
continuous with respect to $\mathbb{W}_{x,y}$, and that there is in fact very little deviation between the two even for
long time intervals. Exact draws from the Brownian bridge are obtained from the stochastic difference
equation (Stramer and Yan, 2007)

$$U_{i-1,m+1} = U_{i-1,m} + \frac{U_{i-1,M} - U_{i-1,m}}{M - m} + \sqrt{\frac{M - m - 1}{M - m}}\;\epsilon_{i-1,m+1}, \tag{24}$$

with $\epsilon_{i,m} \sim N(0, \delta)$, $i = 1, \ldots, N - 1$, $m = 1, \ldots, M - 1$.
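
The recursion in (24) is straightforward to implement. The following sketch draws one scalar Brownian bridge between two observed values; for the bivariate state used in the text the same recursion would be applied componentwise. The function name and the numerical values in the usage lines are illustrative assumptions.

```python
import numpy as np

def brownian_bridge(u_start, u_end, n_sub, delta, rng):
    """Draw U_0, ..., U_M with U_0 = u_start and U_M = u_end via recursion (24)."""
    path = np.empty(n_sub + 1)
    path[0] = u_start
    for m in range(n_sub):
        remaining = n_sub - m                       # M - m
        pull = (u_end - path[m]) / remaining        # drift towards the endpoint
        scale = np.sqrt((remaining - 1) / remaining)
        noise = rng.normal(0.0, np.sqrt(delta))     # epsilon ~ N(0, delta)
        path[m + 1] = path[m] + pull + scale * noise
    return path

rng = np.random.default_rng(1)
bridge = brownian_bridge(u_start=-3.2, u_end=-3.0, n_sub=24, delta=1e-4, rng=rng)
print(bridge[0], bridge[-1])
```

In the final step the noise scale is zero and the pull covers the full remaining distance, so the path ends at the observed value up to floating-point rounding.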

Optimizing $\theta^{X\mathbb{P}} \mid \theta^{\sigma}, \theta^{\mathbb{Q}}, \theta^{V\mathbb{P}*}$: Conditional on the optimal variance drift parameters $\theta^{V\mathbb{P}*}$ the $f$ and
$g$ functions (the $g$ function depends on the drift parameters of the variance through $\epsilon^{Y}$) from Table 2a
can now be swapped with the functions from Table 2b to estimate optimal stock drift parameters $\theta^{X\mathbb{P}*}$
through the solution of the linear system (22)-(23).
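
For intuition about the linear system, consider the special case without auxiliary points. The bridge expectations then reduce to evaluations at the observed data and the system becomes an ordinary least-squares problem for the drift coefficients of a unit-diffusion process. The sketch below works under exactly that simplification and is not the EML algorithm itself; the factor `delta` in the matrix is a scaling convention chosen here so that the solution returns the drift coefficients directly, and the basis functions, parameter values and function names are illustrative assumptions.

```python
import numpy as np

def drift_least_squares(y, basis, delta):
    """Solve S x = w with S = sum_k f f^T * delta and w = sum_k f * (y_{k+1} - y_k)."""
    f = np.column_stack([b(y[:-1]) for b in basis])   # one column per basis function
    increments = np.diff(y)
    s_matrix = f.T @ f * delta
    w_vector = f.T @ increments
    return np.linalg.solve(s_matrix, w_vector)

def simulate_unit_diffusion(coeffs, basis, y0, delta, n_steps, seed=0):
    """Euler path of dY = sum_l coeffs[l] * basis[l](Y) dt + dW."""
    rng = np.random.default_rng(seed)
    y = np.empty(n_steps + 1)
    y[0] = y0
    noise = rng.normal(0.0, np.sqrt(delta), n_steps)
    for k in range(n_steps):
        drift = sum(c * float(b(y[k])) for c, b in zip(coeffs, basis))
        y[k + 1] = y[k] + drift * delta + noise[k]
    return y

# Illustrative drift 0.5 - 1.0 * y written as a linear combination of basis functions.
basis = [lambda y: np.ones_like(y, dtype=float), lambda y: np.asarray(y, dtype=float)]
true_coeffs = [0.5, -1.0]
path = simulate_unit_diffusion(true_coeffs, basis, y0=0.5, delta=0.01, n_steps=200_000)
estimate = drift_least_squares(path, basis, delta=0.01)
print(estimate)
```

Because the drift is linear in its coefficients, the estimator is available in closed form given the data, which is the property the three-step procedure exploits.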

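There is no direct EML estimator for $\theta^{\sigma} \cup \theta^{\mathbb{Q}}$. To find $\theta^{\sigma*}$ and $\theta^{\mathbb{Q}*}$ we therefore need to perform a
conventional likelihood search using likelihood (19) as the objective function. Since for any value of $\theta^{\sigma}$ and
$\theta^{\mathbb{Q}}$ EML yields optimal $\theta^{X\mathbb{P}*}$ and $\theta^{V\mathbb{P}*}$, we view the likelihood function (19) only as a function of $\theta^{\sigma}$, $\theta^{\mathbb{Q}}$, and
the data. To approximate the unknown transition densities which appear in (19) we use the
simulation-based estimator from Pedersen (1995) in connection with the Brownian bridge importance sampler from
Durham and Gallant (2002)

$$\prod_{i=1}^{N} p^{\mathcal{M}}(U_i \mid U_{i-1}, \theta) \approx \prod_{i=1}^{N} \left[ \frac{1}{S} \sum_{s=1}^{S} \frac{\prod_{m=1}^{M} p^{EM}(U_{i-1,m} \mid U_{i-1,m-1}, \theta)}{\prod_{m=1}^{M-1} q(U_{i-1,m} \mid U_{i-1,m-1}, U_{i-1,M})} \right]. \tag{25}$$

Here, $p^{\mathcal{M}}$ refers to the true transition density arising from Heston dynamics with variance specification (11)
(LN) and (12) (NL), respectively. The density $p^{EM}$ denotes a normal distribution arising from the Euler
discretization of the corresponding SDE. Auxiliary state variables $U_{i-1,m}$, $i = 1, \ldots, N$, $m =
1, \ldots, M - 1$, are simulated according to the stochastic difference equation

$$U_{i-1,m+1} = U_{i-1,m} + \frac{U_{i-1,M} - U_{i-1,m}}{M - m} + \sqrt{\frac{M - m - 1}{M - m}}\;\Sigma(U_{i-1,m})\,\epsilon_{i-1,m+1}, \tag{26}$$

where

$$\Sigma(U_{i-1,m}) = \begin{pmatrix} \sqrt{1-\rho^2}\,\sqrt{e^{\sigma Y_{i-1,m}}} & \rho\,\sqrt{e^{\sigma Y_{i-1,m}}} \\ 0 & 1 \end{pmatrix}, \qquad \epsilon_{i-1,m+1} = \begin{pmatrix} \epsilon^{X}_{i-1,m+1} \\ \epsilon^{Y}_{i-1,m+1} \end{pmatrix}. \tag{27}$$

Both $p^{EM}$ and $q$ are multivariate normal densities. Writing $\phi(\,\cdot\,; m, C)$ for the normal density with mean $m$
and covariance matrix $C$, and $\mu^{\mathbb{P}}_{\mathcal{M}}$ for the drift of $U = (X, Y)^{\top}$ under P in model $\mathcal{M}$, we have

$$q(U_{i-1,m+1} \mid U_{i-1,m}, U_{i-1,M}) = \phi\!\left( U_{i-1,m+1};\; U_{i-1,m} + \frac{U_{i-1,M} - U_{i-1,m}}{M - m},\; \frac{M - m - 1}{M - m}\,\delta\,\Sigma(U_{i-1,m})\Sigma(U_{i-1,m})^{\top} \right),$$

$$p^{EM}(U_{i-1,m+1} \mid U_{i-1,m}, \theta) = \phi\!\left( U_{i-1,m+1};\; U_{i-1,m} + \mu^{\mathbb{P}}_{\mathcal{M}}(U_{i-1,m})\,\delta,\; \delta\,\Sigma(U_{i-1,m})\Sigma(U_{i-1,m})^{\top} \right).$$

Following Stramer and Yan (2007) we set $S = M^2 = 576$. Note that the $\epsilon$ variates appearing in (24) from
EML estimation may be reused in this step.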
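
The simulation-based density estimator can be illustrated in the simplest setting of a scalar diffusion with unit diffusion coefficient, dY = drift(Y) dt + dW. The sketch below is not the bivariate implementation used in the paper: it combines Euler densities along bridge-sampled auxiliary paths with importance weights, in the spirit of Pedersen (1995) and Durham and Gallant (2002). The function names and numerical values are illustrative assumptions.

```python
import numpy as np

def normal_logpdf(x, mean, var):
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def simulated_transition_density(y0, y1, drift, dt, n_sub, n_samples, seed=0):
    """Importance-sampling estimate of p(y1 | y0) over a time step dt."""
    rng = np.random.default_rng(seed)
    delta = dt / n_sub
    y = np.full(n_samples, float(y0))
    log_weight = np.zeros(n_samples)
    for m in range(n_sub - 1):
        remaining = n_sub - m
        prop_mean = y + (y1 - y) / remaining            # bridge proposal mean
        prop_var = delta * (remaining - 1) / remaining  # bridge proposal variance
        y_next = prop_mean + np.sqrt(prop_var) * rng.standard_normal(n_samples)
        log_weight += normal_logpdf(y_next, y + drift(y) * delta, delta)  # Euler density
        log_weight -= normal_logpdf(y_next, prop_mean, prop_var)          # proposal density
        y = y_next
    # The last Euler step must land on the observed endpoint y1.
    log_weight += normal_logpdf(y1, y + drift(y) * delta, delta)
    return float(np.mean(np.exp(log_weight)))

# Illustrative use with M = 24 sub-steps and S = 576 simulated paths.
density = simulated_transition_density(
    y0=-3.2, y1=-3.1, drift=lambda y: -0.5 * y - 1.0, dt=1 / 262,
    n_sub=24, n_samples=576)
print(density)
```

A useful sanity check of this sketch is the zero-drift case: the bridge proposal is then the exact conditional law, every importance weight equals the Gaussian transition density, and the estimator has zero variance.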

3.5 Empirical results 

We assess the quality of the linear and nonlinear models introduced in Section 3.2 by investigating forecasts 
of realized variance, implied variance and stock returns for various maturities. The forecasting exercise is 
performed both in and out of sample. For the out-of-sample period the model is re-estimated each time a 
new datapoint is added. Figure 1 gives a visual impression of the in-sample as well as the out-of-sample 
period. The corresponding estimation paths for a selection of parameters can be seen in Figures 3a to 3d. 

Point estimates and standard errors for the parameters of the nonlinear model NL (cf. (12)) and the 
linear model LN (cf. (11)) obtained from the limited information EML algorithm described in Section 3.4 
can be found in Table 3. These parameter estimates are based on the entire sample. The Q mean-reversion
parameter $b_1^{\mathbb{Q}}$ is large and positive for the linear and the nonlinear model. This is consistent with the
explosive coefficients estimated in Jones (2003) and Pan (2002) and with the negative variance risk premia
observed in Carr and Wu (2008). The positive estimates result in a time series for instantaneous variance
that is located consistently below the time series of implied variance through relation (15) (see Figure 2a).
The correlation and diffusion parameters p and a as well as are also comparable in scale for both 
specifications. However under the physical measure P the linear and the nonlinear specifications predict 
different behaviour. Figure 2d shows that during calm times (in the region between 0.01 and 0.04) the 
nonlinear specification predicts that the instantaneous variance behaves as a random walk or a process that 
diverts at an even faster rate. Figure 2b suggests that there is a strong pull away from the zero boundary 
and from very high values in the case of the nonlinear drift. Such behaviour cannot be reproduced with 
a linear drift specification (cf. also the drift function estimated from the time series of the VIX index 
in Bandi and Reno (2009), which is of a shape similar to that of the drift function in Figure 2b). 

Recall that the risk premium (i.e. the market price of risk) at time $t \in [0, T]$ in the model $\mathcal{M} \in
\{LN, NL\}$ is given by

$$\Lambda_{\mathcal{M}}(V_t) = \Sigma(V_t)^{-1} f_{\mathcal{M}}(V_t), \qquad \text{where} \qquad \Sigma(V_t) = \sqrt{V_t} \begin{pmatrix} \sqrt{1-\rho^2} & \rho \\ 0 & \sigma\sqrt{V_t} \end{pmatrix} \tag{28}$$

and $f_{\mathcal{M}}$ is defined in (7) and (9) for $\mathcal{M} = LN$ and $\mathcal{M} = NL$ respectively. The risk premium for the
stochasticity of the variance is given by the second component $\Lambda^{V}_{\mathcal{M}}(V_t)$ of the market price of risk vector
$\Lambda_{\mathcal{M}}(V_t)$. Note that in the case $\mathcal{M} = LN$, the variance risk premium $\Lambda^{V}_{LN}(V_t)$ is a non-zero constant given
by $(b_1 - b_1^{\mathbb{Q}})/\sigma$. The resulting time series of the risk premia reflect the difference in the estimated real
world drifts described in the previous paragraph: while the unconditional mean of the risk premium on
the $W^{V\mathbb{Q}}$ Brownian motion (see SDE (4)) is similar for both specifications, the nonlinear model exhibits
time-variability in the market prices of risk (see Figure 2c), in contrast to the constant risk premium,
given by $(b_1 - b_1^{\mathbb{Q}})/\sigma$, in the linear model.
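
The difference between the two specifications can be made explicit by evaluating the market price of risk in (28) numerically. The sketch below solves the two-dimensional linear system for a given variance level. It assumes that the second component of the function $f$ in the linear model equals $(b_1 - b_1^{\mathbb{Q}}) v$, which is the form consistent with the constant premium stated in the text; all parameter values and function names are illustrative.

```python
import numpy as np

def sigma_matrix(v, rho, sigma):
    """Volatility matrix of (X, V) at variance level v, as in (28)."""
    return np.sqrt(v) * np.array([[np.sqrt(1.0 - rho**2), rho],
                                  [0.0, sigma * np.sqrt(v)]])

def f_linear(v, r, a0, a1, b1, b1_q):
    # Assumed form: the variance component is proportional to v.
    return np.array([a0 - r + (a1 + 0.5) * v, (b1 - b1_q) * v])

def f_nonlinear(v, r, a0, a1, b, b0_q, b1_q):
    b0, b1, b2, b3 = b
    return np.array([a0 - r + (a1 + 0.5) * v,
                     b0 - b0_q + (b1 - b1_q) * v + b2 * v**2 + b3 / v])

def market_price_of_risk(v, rho, sigma, f_value):
    """Solve Sigma(v) * Lambda = f(v) for the market price of risk vector."""
    return np.linalg.solve(sigma_matrix(v, rho, sigma), f_value)

# Illustrative parameter values only.
for v in (0.01, 0.04, 0.20):
    lam_ln = market_price_of_risk(v, -0.6, 1.2, f_linear(v, 0.03, 0.05, 1.0, -3.0, 2.0))
    lam_nl = market_price_of_risk(
        v, -0.6, 1.2, f_nonlinear(v, 0.03, 0.05, 1.0, (0.1, -3.0, -5.0, 1e-4), 0.05, 2.0))
    print(v, lam_ln[1], lam_nl[1])
```

In the linear case the second component does not depend on the variance level, while in the nonlinear case it changes with it.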

3.5.1 Forecasts 

In this subsection we consider, in addition to the nonlinear model (NL) and the linear model (LN), 
a random walk martingale model (RW), where the prediction for any future value is taken to be the 
current value. The forecasting power of the models is tested with 2395 in-sample, and 1630 out-of-sample 
observations of implied variance forecasts. Each observation of forecast errors is comprised of a cross 
section of residuals pertaining to 1 day, 1 week, 4 weeks, 12 weeks (quarter trading year) and 26 weeks 
(half trading year) forecasting errors for stock returns and implied variance. Realized variance forecast 
errors are computed for horizons of 1 week, 4 weeks, 12 weeks and 26 weeks, where realized variance at 
time $t_i$ computed over $N$ days is defined as



j=i-N 

For the model $\mathcal{M} \in \{RW, LN, NL\}$ we compute the realized variance using the model-implied
instantaneous variance, which is annualized by construction



Conditional expectations for the LN and NL models are computed by Monte Carlo integration using $2 \cdot 10^4$
paths with hourly discretization of the SDE.¹ Tables 4, 5 and 6 report the mean absolute error (MAE) and
the root mean squared error (RMSE) of the sampling distribution of forecasting residuals for the realized 
variance, the implied variance and the stock returns, respectively. In addition directional forecasts as well 
as p-values of the Clark and West (2007) test statistics for nested models are reported. 
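
The error measures and the nested-model comparison can be sketched as follows. The statistic below is the MSPE-adjusted statistic of Clark and West (2007) computed with a plain sample standard error; this is a simplification chosen for brevity, since multi-step forecasts would normally call for an autocorrelation-robust variance estimate. The function names and the synthetic data are illustrative assumptions.

```python
import math
import numpy as np

def forecast_errors(actual, forecast):
    """Mean absolute error and root mean squared error of a forecast."""
    residual = np.asarray(actual, dtype=float) - np.asarray(forecast, dtype=float)
    return {"mae": float(np.mean(np.abs(residual))),
            "rmse": float(np.sqrt(np.mean(residual**2)))}

def clark_west(actual, forecast_small, forecast_large):
    """MSPE-adjusted statistic for nested models with a one-sided p-value.

    Simplification: plain (non-HAC) standard error of the adjusted loss differential.
    """
    y = np.asarray(actual, dtype=float)
    f1 = np.asarray(forecast_small, dtype=float)
    f2 = np.asarray(forecast_large, dtype=float)
    adjusted = (y - f1) ** 2 - ((y - f2) ** 2 - (f1 - f2) ** 2)
    t_stat = adjusted.mean() / (adjusted.std(ddof=1) / math.sqrt(len(adjusted)))
    p_value = 0.5 * math.erfc(t_stat / math.sqrt(2.0))  # upper-tail normal probability
    return float(t_stat), float(p_value)

# Synthetic illustration: the larger model captures a predictable component.
rng = np.random.default_rng(3)
signal = rng.standard_normal(500)
target = signal + 0.5 * rng.standard_normal(500)
t_stat, p_value = clark_west(target, np.zeros(500), signal)
print(forecast_errors(target, signal), t_stat, p_value)
```

A small p-value indicates that the larger model has a significant forecasting advantage over the nested smaller one.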

¹ With approximation (15) forecasts for the implied variance can be computed as linear functions of conditional
instantaneous variance expectations. Expectations for both the linear and nonlinear model are evaluated by Monte Carlo integration.
This is done despite the availability of an analytic expression for the conditional expectation in the linear model, so that both
specifications are subject to the same simulation error (the same set of random numbers is used for the integration).

Figure 2a indicates structural breaks in the implied variance time series. The in-sample period spans
these regimes, and the out-of-sample period contains both very rough and very calm periods. Both
the linear and the nonlinear specification fail to capture the time-dependent long-run mean of the implied
variance. The Clark and West (2007) statistics indicate that both the linear and the nonlinear model have
significant advantages over the random walk. These results are in line with the risk premia approach in 
Chernov (2007). The statistics furthermore indicate that the nonlinear model has a statistically (highly) 
significant advantage in forecasting over the linear model. This observation holds for the realized and 
implied variance for all forecasting horizons, both in sample and out of sample. A tremendous improvement 
for the realized variance over the RW specification can be attributed to mean reversion, which the NL and 
the LN model accommodate. The forecast residuals for all three competing models are heavily negatively 
skewed, however, and the distribution is fat-tailed as a consequence of a few heavy outliers. Table 4 
additionally reports normalized MSE (NMSE) for comparison with the results in Sizova (2008). For stock
returns, the excellent in-sample results, most likely obtained through explicit modeling of the leverage
effect, cannot be reproduced out of sample. The likely reason is a change in the drift regime (cf. Figure 1),
which is not accounted for by the variance modeling.
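
The nested-model comparison can be sketched as follows. The code implements the MSPE-adjusted statistic of Clark and West (2007) in its simplest form: the adjusted loss differential is averaged and divided by its ordinary standard error, and a one-sided p-value is taken from the standard normal distribution. This is a simplification, since for multi-step forecasts with overlapping errors an autocorrelation-robust standard error would be needed. The function name `clark_west` and the toy AR(1) data are hypothetical and serve only as an illustration.

```python
import math
import random


def clark_west(actual, forecast_small, forecast_large):
    """Return (t-statistic, one-sided p-value) of the MSPE-adjusted test."""
    # Adjusted loss differential: squared error of the small model minus the
    # squared error of the large model, corrected by the squared forecast gap.
    f = [
        (y - f1) ** 2 - ((y - f2) ** 2 - (f1 - f2) ** 2)
        for y, f1, f2 in zip(actual, forecast_small, forecast_large)
    ]
    n = len(f)
    mean_f = sum(f) / n
    var_f = sum((x - mean_f) ** 2 for x in f) / (n - 1)
    if var_f == 0.0:
        raise ValueError("adjusted loss differential has zero variance")
    t_stat = mean_f / math.sqrt(var_f / n)
    p_value = 0.5 * math.erfc(t_stat / math.sqrt(2.0))  # 1 - Phi(t_stat)
    return t_stat, p_value


# Toy data: a mean-reverting AR(1) series (hypothetical, for illustration only).
rng = random.Random(0)
series = [0.0]
for _ in range(500):
    series.append(0.5 * series[-1] + rng.gauss(0.0, 1.0))

current, actual = series[:-1], series[1:]
rw_forecast = current                     # random walk: predict the current value
ar_forecast = [0.5 * y for y in current]  # larger model that nests the random walk

t_stat, p_value = clark_west(actual, rw_forecast, ar_forecast)
print(t_stat, p_value)
```

A small p-value rejects equal predictive accuracy in favour of the larger model, which is how the CW columns in Tables 4, 5 and 6 are read.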

The results for the directional forecasts of the realized and implied variance and of stock returns are 
mixed. The NL and LN models show very good results in sample and out of sample for the realized 
variance. For the implied variance the directional forecasts are slightly worse than the ones in Ahoniemi 
(2006), where forecasts are given for one maturity only. Directional out-of-sample forecasts for stock 
returns also suffer from the change in the drift during the out-of-sample period mentioned above (cf. 
Figure 1). 
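
The error measures reported in Tables 4, 5 and 6 can be sketched as follows. The sketch assumes that a directional forecast counts as correct when the predicted change and the realized change relative to the current value have the same sign; this convention, the function name `forecast_metrics` and the toy numbers are assumptions made for illustration and are not taken from the paper.

```python
import math


def forecast_metrics(actual, forecast, current):
    """Return MAE, RMSE, NMSE and the percentage of correct directional forecasts."""
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    sum_sq = sum(e * e for e in errors)
    mean_actual = sum(actual) / n
    # Directional hit: predicted and realized change share the same sign.
    hits = sum(
        1 for a, f, c in zip(actual, forecast, current) if (a - c) * (f - c) > 0
    )
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(sum_sq / n),
        # Squared errors normalized by the variation of the realized values.
        "NMSE": sum_sq / sum((a - mean_actual) ** 2 for a in actual),
        "DIR": 100.0 * hits / n,
    }


# Toy numbers (hypothetical): realized values, forecasts, and values at forecast time.
metrics = forecast_metrics(
    actual=[1.0, 2.0, 3.0, 4.0],
    forecast=[1.5, 2.0, 2.5, 5.0],
    current=[0.5, 2.5, 3.5, 4.5],
)
print(metrics)
```

Under this convention a random walk forecast, which predicts no change, never scores a directional hit, so DIR is informative only for models that predict a change.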

4 Conclusion 

We introduce a simple continuous-time diffusion framework that combines semi-analytic pricing formulae 
with flexible nonlinear time series modeling. Using an econometrically inconspicuous dampening function 
we ensure that a solution to the nonlinear stochastic differential equation under the physical measure 
exists. We estimate a nonlinear stochastic volatility model on the joint time series of the S&P 100 and 
the VXO implied volatility index. Forecast tests show that the nonlinear model has superior forecasting 
power over the random walk and the linear model for short prediction horizons; the results are statistically 
significantly better both in sample and out of sample. This suggests that a nonlinear specification of the 
drift under the physical measure could potentially be very useful in trading and risk management. 
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Figures and Tables 
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Table 3: Parameter Estimates: The table displays parameter estimates for the linear model (11) and the 
nonlinear model (12). Huber (sandwich) standard errors are computed from the asymptotic covariance matrix 
pertaining to the likelihood in (19) with the transition density approximation in (25). The asymptotic
covariance matrix of the estimated parameter vector $\theta$ is computed according to the formula in Hamilton
(1994, page 145, formula 5.8.7).
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(c) Variance Premia 
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(d) Estimated Drift (Zoom) 



Figure 2: VXO and instantaneous variance: Figure 2a displays the VXO along with the instantaneous 
variance implied by the Q parameters from Table 3. Figure 2b displays the nonlinear drift at the parameter 
estimates for variance model (8), and the linear drift for model (6). The implied time series for the risk 
premia of the stochastic variance, given by the second coordinate of the market price of risk vector (28), in 
the linear and the nonlinear model is displayed in Figure 2c. 
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Figure 3: Parameter paths: Figure 3 displays estimates for the parameters 6q,6^,ct, and p, respectively, as 
the sample window is updated on a daily basis. For each date on the x-axis, models (11) and (12) are
re-estimated using the EML methodology from Section 3.4.
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Table 4: Realized variance forecasting: This table displays mean absolute error MAE, given by 

$\frac{1}{N-\tau} \sum_{i=\tau}^{N} |e_i(\tau)|$, root mean squared forecast error RMSE, given by
$\sqrt{\frac{1}{N-\tau} \sum_{i=\tau}^{N} e_i(\tau)^2}$, and normalized MSE (NMSE), defined as
$\sum_{i=\tau}^{N} e_i(\tau)^2 \big/ \sum_{i=\tau}^{N} (RV_i(\tau) - \overline{RV}(\tau))^2$, where
$e_i(\tau) := RV_i(\tau) - E_{t_{i-\tau}}[RV_i^M(\tau)]$ and $\tau \in \{5, 22, 66, 131\}$. The realized variance
$RV_i(\tau)$ is defined in (29) and the random variable $RV_i^M(\tau)$ is given in (30) for any
$M \in \{RW, LN, NL\}$, where RW denotes the random walk model and LN (resp. NL)
stands for the linear (resp. nonlinear) model given in (11) (resp. in (12)). DIR shows the percentage of
correct directional forecasts. CW denotes p-values for the Clark and West (2007) test for nested models.
Asterisks (***), (**), (*) denote significance at the 1%, 5% and 10% level, respectively.
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Table 5: Implied variance forecasting: This table displays mean absolute error MAE, given by 

$\frac{1}{N-\tau} \sum_{i=1}^{N-\tau} |e_i(\tau)|$, and root mean squared forecast error RMSE, defined by
$\sqrt{\frac{1}{N-\tau} \sum_{i=1}^{N-\tau} e_i(\tau)^2}$, where
$e_i(\tau) := IV_{t_{i+\tau}} - E_{t_i}[IV^M_{t_{i+\tau}}]$ and $\tau \in \{1, 5, 22, 66, 131\}$. The random variable
$IV_t^M$ is defined as a linear transformation, given in (17), of the instantaneous variance in the model
$M \in \{NL, LN\}$ and $IV_t$ denotes the square of the VXO index at time $t$. As in the previous table, RW
denotes the random walk model and LN (resp. NL) stands for the linear (resp. nonlinear) model given in (11)
(resp. in (12)). DIR shows the percentage of correct directional forecasts. CW denotes p-values for the
Clark and West (2007) test for nested models. Asterisks (***), (**), (*) denote significance at the 1%, 5%
and 10% level, respectively.
Table 6: Log stock forecasting: This table displays mean absolute error MAE, given by
$\frac{1}{N-\tau} \sum_{i=1}^{N-\tau} |e_i(\tau)|$, and root mean squared forecast error RMSE, defined by
$\sqrt{\frac{1}{N-\tau} \sum_{i=1}^{N-\tau} e_i(\tau)^2}$, where
$e_i(\tau) := X_{t_{i+\tau}} - E_{t_i}[X^M_{t_{i+\tau}}]$ and $\tau \in \{1, 5, 22, 66, 131\}$. The random variable
$X_t^M$ represents the log stock in the model $M \in \{NL, LN\}$ and $X_t$ denotes the recorded value of the
logarithm of the S&P 100 at time $t$. As in the previous table, RW denotes the random walk model and LN
(resp. NL) stands for the linear (resp. nonlinear) model given in (11) (resp. in (12)). DIR shows the
percentage of correct directional forecasts. CW denotes p-values for the Clark and West (2007) test for
nested models. Asterisks (***), (**), (*) denote significance at the 1%, 5% and 10% level, respectively.
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